This is the title

Tony Liang

University of British Columbia

December 31, 2023

About myself

  • PhD student in Bioinformatics under the supervision of Dr. Amrit Singh

  • BSc. in Math + minor in Data Science

  • What does a “bioinformatician” do?

    • Assist researchers like you to better understand what your data means
    • We’re just coders who know a little bit more biology
  • Currently building pipelines, tools, and models that analyze biological data in an automated fashion

    • Focused on machine learning & AI
    • Reproducible workflows

Single Cell RNA-seq

What is single cell RNA sequencing?

Loading data

The data is from a study (Kang et al. 2018) and is publicly available through the ExperimentHub package in R (Bioconductor)

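library(ExperimentHub)        # provides ExperimentHub()
library(SingleCellExperiment) # class of the object we retrieve
eh <- ExperimentHub() # Initialize the hub, a list-like object of available records
sce <- eh[["EH2259"]] # Extract the matching record from our hub by its ID
# Then print it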
sce
class: SingleCellExperiment 
dim: 35635 29065 
metadata(0):
assays(1): counts
rownames(35635): MIR1302-10 FAM138A ... MT-ND6 MT-CYB
rowData names(2): ENSEMBL SYMBOL
colnames(29065): AAACATACAATGCC-1 AAACATACATTTCC-1 ... TTTGCATGGTTTGG-1
  TTTGCATGTCTTAC-1
colData names(5): ind stim cluster cell multiplets
reducedDimNames(1): TSNE
mainExpName: NULL
altExpNames(0):
  • After loading the data, inspect its basic information
  • What are the rows? The columns? How large is the data?
  • Rows = genes
  • Columns = cells
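
A minimal sketch of that inspection, using a small made-up SingleCellExperiment (the name `toy_sce`, its counts, gene names, and `stim` labels are illustrative assumptions, not the Kang et al. data). The same accessors should apply to the `sce` loaded above:

```r
library(SingleCellExperiment)

# Toy stand-in for the real object: 4 genes x 3 cells (values are made up)
toy_counts <- matrix(
  c(0, 2, 1,
    5, 0, 3,
    0, 0, 0,
    1, 4, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)
toy_sce <- SingleCellExperiment(
  assays = list(counts = toy_counts),
  colData = DataFrame(stim = c("ctrl", "stim", "ctrl"))
)

dim(toy_sce)             # number of genes (rows) and cells (columns)
head(rownames(toy_sce))  # gene identifiers
head(colnames(toy_sce))  # cell barcodes
colData(toy_sce)         # per-cell metadata
table(toy_sce$stim)      # number of cells per condition
```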

Preprocessing of data before analysis

The retrieved data is in its rawest form, and not all of it is useful.

Caveats:

  • Undetected genes
  • Cells with very few or many detected genes
  • Lowly expressed genes
  • Unnormalized expression values

Handling these is usually called the quality control (QC) step. QC could fill a tutorial of its own, so it is not covered in depth today.

We only perform a few simple steps
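
For orientation, here is a minimal sketch of how the other caveats listed above could be checked on a toy count matrix. The matrix `m`, its values, and both cutoffs are arbitrary illustrations, not the settings used in this tutorial; on the real data, `counts(sce)` would presumably take the place of `m`.

```r
# Toy count matrix: 4 genes x 4 cells (made-up values)
m <- matrix(
  c(0, 2, 1, 0,
    5, 0, 3, 0,
    0, 0, 0, 0,
    1, 4, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

# Number of genes detected (count > 0) in each cell
genes_per_cell <- colSums(m > 0)

# Number of cells in which each gene is detected
cells_per_gene <- rowSums(m > 0)

# Keep cells with at least 2 detected genes and genes seen in at least 2 cells
# (both cutoffs are arbitrary, for illustration only)
keep_cells <- genes_per_cell >= 2
keep_genes <- cells_per_gene >= 2
m_filtered <- m[keep_genes, keep_cells, drop = FALSE]

# Simple library-size normalization (scaled to 10,000 counts per cell),
# followed by a log(1 + x) transform
lib_size <- colSums(m_filtered)
m_norm <- log1p(t(t(m_filtered) / lib_size) * 1e4)
```

In practice the cutoffs are chosen by looking at the distributions of `genes_per_cell` and `cells_per_gene` rather than fixed in advance.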

Remove undetected genes

counts(sce) |> colnames() |> sample(10)
 [1] "GAAGTAGACTGTAG-1" "ATAAGTTGGTGCTA-1" "GGTACATGCTATGG-1" "GAGTCTGATTCGGA-1"
 [5] "AACAGCACTCTTTG-1" "TCCCGATGCCCTTG-1" "ACTCCTCTTAGAAG-1" "GTCCCATGTCAAGC-1"
 [9] "ACGTAGACCAGTCA-1" "TGGTATCTCACTAG-1"
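
The removal step named in the heading can be sketched as follows. This is one common approach, shown on a made-up toy object (the name `toy_sce` and its values are assumptions); on the real data the same subsetting line would be applied to `sce`.

```r
library(SingleCellExperiment)

# Toy object: 3 genes x 3 cells, where gene2 has no counts in any cell
toy_counts <- matrix(
  c(0, 2, 1,
    0, 0, 0,
    3, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)
toy_sce <- SingleCellExperiment(assays = list(counts = toy_counts))

# A gene counts as "detected" if it has a nonzero count in at least one cell
detected <- rowSums(counts(toy_sce) > 0) > 0

# Subset rows (genes); all cells are kept
toy_sce <- toy_sce[detected, ]
dim(toy_sce)
```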

Reference

Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. 2018. “Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation.” Nature Biotechnology 36 (1): 89–94.